The Autshumato TMX Integrator is a utility application for updating and integrating, over a network, the translation memories created by the Autshumato ITE.
This means that all the TMX files (translation memory files) created by you and the other users on your network are synchronised for each language pair. You will therefore have access to the translation memories of other users working on the same language pair, and they will be integrated into your Autshumato ITE. The TMX Integrator also filters out duplicate entries and exports conflicts (described below) to ensure that all users on the network have access to standardised material.
Copyright (C) 2010 Centre for Text Technology (CTexT®), North-West University and Department of Arts and Culture, Government of South Africa.
Home page: http://www.nwu.co.za/ctext
Project page: http://sourceforge.net/projects/autshumatotmxin/
The Autshumato TMX Integrator is released under the TMate Open Source License and is free for anyone to download and use.
The license is available in the /doc/License.txt document. It can also be viewed at http://svnkit.com/license.html.
The source files of the SVNKit library can be obtained from http://svnkit.com/.
Download and extract the Autshumato.TMX.Integrator.X.X.X.zip archive. The latest version can be found at http://autshumato.sourceforge.net/Resources.html.
The application can be run using any of the following three methods:
1. Enter the following command into a command prompt/terminal, whilst in the Autshumato.TMX.Integrator directory (the directory the application was extracted to):
   java -jar Autshumato.TMX.Integrator.jar
2. To run the application in console mode, enter the following command into a command prompt/terminal, whilst in the same directory:
   java -jar Autshumato.TMX.Integrator.jar console
3. If your platform supports it, run the application using the Java(TM) Platform SE binary:
   - Right-click the Autshumato.TMX.Integrator.jar file, which can be found in the directory the application was extracted to.
   - Select Open With.
   - Select Java(TM) Platform SE binary.
   This opens the program in GUI mode.
Run the program as explained in the previous section. The graphical user interface (GUI) will be shown as depicted in the picture below.
Settings:
First of all specify and confirm your settings:
The username (with accompanying password) and server address should have been provided to you by the person responsible for setting up the Autshumato TMX Integrator server. Refer to the "doc/TMX Integrator Server setup (openSUSE).odt" document for instructions on setting up a server for the application.
Confirm that the Project Path points to the directory of the Autshumato ITE project containing the translation memories you want to update. Press the "Save Settings" button to save the settings.
The Save Settings button is then disabled and only becomes available again once a setting has changed.
Note: For the program to register that a setting has changed, you have to move the cursor out of the field you edited. In other words, after changing a setting you can only save it once the cursor has been placed somewhere else.
Update:
To start the translation memory update click on the "Update" button. The progress of the update will be displayed in the progress panel (shown in the picture above).
Wait for the update to complete. If you click on the "Cancel" button before the process is completed, it will be cancelled and your translation memories will not be updated.
The update process consists of three main components: the Compounder, the Exporter and the Updater. Their operation is described below. You do not have to start the components individually; they run in succession when you run the TMX Integrator.
Compounder
1. Finds newly created TM files, which have to adhere to the naming convention (for example, ENG.AFR.TM.201007161146.tmx). The application lists the number of new TM files found.
2. The files are inspected to find conflicts and duplicates between the different files.
3. The Compounder process information is displayed in the progress panel. This information lists the number of new TM files found, number of compounded outputs and the number of files that failed the inspections.
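The Compounder steps above can be illustrated with a minimal sketch; this is not the application's own code. It assumes the TM naming convention is SOURCE.TARGET.TM.<12-digit timestamp>.tmx, inferred from the file name ENG.AFR.TM.201007161146.tmx that appears in the console output, and it represents each TM as a list of (source, target) pairs instead of parsing real TMX XML. A duplicate is taken to be an identical source/target pair, and a conflict a source sentence with more than one distinct translation. Whether the real Compounder keeps conflicting translations in the compounded TM is not stated; this sketch keeps them and also reports them. The names `language_pair` and `compound` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed convention, inferred from "ENG.AFR.TM.201007161146.tmx":
# SOURCE.TARGET.TM.<12-digit timestamp>.tmx (hypothetical pattern)
TM_NAME = re.compile(r"^([A-Z]{3})\.([A-Z]{3})\.TM\.(\d{12})\.tmx$")


def language_pair(filename):
    """Return (source, target) if the name follows the convention, else None."""
    match = TM_NAME.match(filename)
    return (match.group(1), match.group(2)) if match else None


def compound(tm_files):
    """Merge TMs per language pair, dropping duplicates and collecting conflicts.

    tm_files maps a file name to a list of (source, target) pairs.
    Returns (compounded, conflicts, rejected):
      compounded: {pair: [(source, target), ...]} without duplicates
      conflicts:  {pair: {source: [target, ...]}} for sources with >1 translation
      rejected:   names that do not follow the naming convention
    """
    merged = defaultdict(dict)  # pair -> {source: [distinct targets, in order]}
    rejected = []
    for name, segments in tm_files.items():
        pair = language_pair(name)
        if pair is None:
            rejected.append(name)
            continue
        for source, target in segments:
            targets = merged[pair].setdefault(source, [])
            if target not in targets:  # an identical pair is a duplicate: keep one
                targets.append(target)

    compounded, conflicts = {}, {}
    for pair, by_source in merged.items():
        compounded[pair] = [(s, t) for s, ts in by_source.items() for t in ts]
        clashing = {s: ts for s, ts in by_source.items() if len(ts) > 1}
        if clashing:
            conflicts[pair] = clashing
    return compounded, conflicts, rejected
```

Grouping by language pair first mirrors the "single TM file per language pair" output; the `rejected` list corresponds to the number of files that failed the inspections.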
Exporter
4. This stage exports the compounded files to the output folder.
5. Two copies of each compounded file are written, so that each language in the pair serves once as the source and once as the target language. If a file with the same name already exists, the application does not overwrite it. This means that the TMX Integrator has already updated the translation memories today, and the Exporter will report a failure.
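The Exporter rule (write both directions, never overwrite) can be sketched as follows, as an illustration only. The output file name pattern SOURCE.TARGET.TM.<yyyymmdd>.tmx and the tab-separated file body are hypothetical placeholders: the real application writes TMX, and its exact output naming is not documented here. Including the date in the name is what would make a second export on the same day collide with the first. `export_pair` is a hypothetical name.

```python
from datetime import date
from pathlib import Path


def export_pair(output_dir, pair, segments, day):
    """Write both directions of a compounded TM without overwriting anything.

    Returns True if both files were written, False if either already existed.
    File names and the tab-separated body are hypothetical placeholders.
    """
    source, target = pair
    directions = [
        ((source, target), segments),
        ((target, source), [(t, s) for s, t in segments]),  # swap the roles
    ]
    ok = True
    for (src, tgt), rows in directions:
        path = Path(output_dir) / f"{src}.{tgt}.TM.{day:%Y%m%d}.tmx"
        try:
            # Mode "x" creates the file and raises FileExistsError if it
            # already exists, so an earlier export is never overwritten.
            with open(path, "x", encoding="utf-8") as fh:
                for s, t in rows:
                    fh.write(f"{s}\t{t}\n")
        except FileExistsError:
            ok = False  # reported as an Exporter failure
    return ok


# Example: export_pair("output", ("ENG", "AFR"), segments, date.today())
```

Using exclusive-create mode avoids a separate "does the file exist?" check, which could race with another process writing the same file.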
Updater
6. The Updater starts and initialises the SVN connection protocol. A password prompt will be shown as depicted below. Enter the password you received with your username. The password has to be entered every time you update, to ensure maximum security.
7. The output directory is now updated with all the TM files newly added by other users. It is normal for this step to report a failure the first time the TMX Integrator is run, because there is not yet a link between the current working copy (the folder with the latest repository) and the output folder. This link is established during the update process, so you will not lose any data.
8. The newly compounded TM files are now added to the repository where all the TM files are stored.
Finish
9. The generated TM files are only removed once all three processes described above have completed successfully. This ensures that no data is lost if the program is interrupted during the process.
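The rule in step 9 amounts to a simple guard, sketched below under the assumption that each stage reports success as a boolean. The function name `finish` and the dictionary of stage results are hypothetical; the stage names are those used in this manual.

```python
from pathlib import Path


def finish(generated_files, stage_results):
    """Delete the generated TM files only if every stage succeeded.

    stage_results maps a stage name ("Compounder", "Exporter", "Updater")
    to True (succeeded) or False (failed). Returns the removed paths.
    """
    if not stage_results or not all(stage_results.values()):
        # A stage failed or never ran: keep the files for the next attempt.
        return []
    removed = []
    for name in generated_files:
        path = Path(name)
        if path.exists():
            path.unlink()
            removed.append(path)
    return removed
```

Because deletion is the very last action, an interruption at any earlier point leaves the original TM files in place.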
10. The progress panel will indicate the success of the integration. The table below gives information about the status:
Case | Explanation
All the processes completed successfully... | All went as expected and your translation memories are now up to date.
Compounder => Failed, Exporter => Failed, Updater => Succeeded | If the Compounder fails, the Exporter fails as well. This means that the path to the input directory is incorrect and the program could not find the generated TM files. The Exporter fails because there are no files to export. The Updater will, however, still update your translation memories with the latest files on the server.
Compounder => Succeeded, Exporter => Failed, Updater => Succeeded | The Exporter failed, usually because the files already exist in the output folder. This means that the TMX Integrator has already updated the translation memories today. This does not, however, stop the program from updating your translation memories with the newest files from the server.
Compounder => Succeeded, Updater => Failed | The repository path was incorrect; recheck your settings and try again.
Compounder => Skipped, Exporter => Skipped, Updater => Succeeded | This means that the TMX Integrator has already updated the translation memories today. This does not, however, stop the program from updating your translation memories with the newest files from the server.
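The status table can also be read as a small decision procedure. The sketch below encodes the explanations from the table; it is an illustration, not part of the application. The function name `explain_status` and the exact wording of the returned messages are hypothetical, and combinations the table does not cover fall through to a generic message.

```python
def explain_status(compounder, exporter, updater):
    """Map component outcomes ("Succeeded", "Failed" or "Skipped") to the
    explanations given in the status table. Rules are checked in order."""
    if (compounder, exporter, updater) == ("Succeeded",) * 3:
        return "All went as expected; your translation memories are up to date."
    if compounder == "Failed":
        return ("Check the input directory path: no generated TM files were "
                "found, so there is nothing to export.")
    if updater == "Failed":
        return "The repository path was incorrect; recheck your settings and try again."
    if exporter == "Failed" or (compounder, exporter) == ("Skipped", "Skipped"):
        return ("The translation memories were already updated today; the "
                "Updater still fetches the newest files from the server.")
    return "Unrecognised combination; inspect the progress panel output."
```

Checking the Compounder before the Updater follows the table: a Compounder failure explains the Exporter failure that comes with it.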
Console mode has the same functionality as the GUI, but without a graphical interface. Once you are sure that your settings are correct, console mode can be used to speed up the updating process.
In console mode you cannot change your settings once they have been saved. To change your settings, delete the TMXIntegrator.prefs file in the application directory and run the program again in console mode. Alternatively, you can edit that file by hand.
The table below explains the output shown in console mode and the input it expects:
Output | Explanation
Autshumato TMX Integrator by CTexT (r) | Program information.
Settings => Reading the settings... | The application tries to read the settings. If none are found, as on the first run, default settings are created once you have confirmed them.
Please enter the following values needed to run the application. | Here you confirm and enter the program settings. Each setting is presented with its default value. If the default value is correct, simply press Enter; otherwise type the correct value and then press Enter.
Settings => Writing settings... | Writes the settings and confirms whether this succeeded. If the settings could not be written (stored), you do not have sufficient access rights to the folder containing the application.
Compounder => Starting Compounder... | The Compounder searches for new TM files, parses them and then compounds them into a single TM file per language pair. This process also identifies duplicates, which are removed, and conflicts, which are exported. It reports the number of new TM files found, the number of compounded files obtained from parsing the TM files, and the number of files that could not be read.
Exporter => Starting Exporter... | The Exporter writes the compounded files to the /tm directory.
Updater => Starting Updater... | The Updater initialises and then asks for your password. Enter your repository password and press Enter. The output directory is then updated with all the TM files newly added by other users. It is normal for this step to report a failure the first time the TMX Integrator is run, because there is not yet a link between the current working copy (the folder with the latest repository) and the output folder. This link is established during the update process, so you will not lose any data. The Updater then adds your TM files to the repository and updates your /tm/ directory with the latest TM files on the server.
Compounder => Removing:ENG.AFR.TM.201007161146.tmx | If the update completed successfully, the Compounder removes the separate translation memories from the input directory (/AutshumatoITE/omegat/omegat/). It will not remove your TM files if they have not yet been uploaded to the server.
All the processes completed successfully... | The outcome of the completed process is shown. Refer to step 10 of the GUI section for the possible outcomes and what each means.
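The settings dialogue described in the table (show each default, keep it when Enter is pressed, replace it otherwise) can be sketched as below. This is an illustration only: the prompt format and the function name `prompt_settings` are hypothetical, and the sketch does not read or write the real TMXIntegrator.prefs file, whose format is not described here. The setting names in the usage line are those mentioned in the GUI section; the values are made up.

```python
def prompt_settings(defaults, ask=input):
    """Ask for each setting, showing its default; an empty answer keeps it.

    defaults maps setting names to default values. `ask` is injectable so
    the function can be exercised without a terminal.
    """
    settings = {}
    for name, default in defaults.items():
        answer = ask(f"{name} [{default}]: ").strip()
        settings[name] = answer if answer else default
    return settings


# Example (hypothetical values):
# prompt_settings({"Username": "woerie", "Server address": "svn://localhost/tm"})
```

Passing `input` as a default argument keeps the interactive behaviour while letting a test supply scripted answers.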
Conflicts are extracted to a text document named according to the convention SourceLanguage.TargetLanguage.Username.Conflicts.Date.txt, for example "ENG.AFR.woerie.Conflicts.2010-07-29.txt".
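The naming convention for conflict documents can be built and parsed with a few lines of code. This is a sketch that assumes the date is written in ISO form (YYYY-MM-DD), as in the example file name; the function names are hypothetical.

```python
import re
from datetime import date

# SourceLanguage.TargetLanguage.Username.Conflicts.Date.txt
CONFLICTS_NAME = re.compile(r"^(\w+)\.(\w+)\.(.+)\.Conflicts\.(\d{4}-\d{2}-\d{2})\.txt$")


def conflicts_filename(source, target, username, day):
    """Build the conflict document name for a language pair, user and date."""
    return f"{source}.{target}.{username}.Conflicts.{day.isoformat()}.txt"


def parse_conflicts_filename(name):
    """Return (source, target, username, date), or None if the name does not match."""
    match = CONFLICTS_NAME.match(name)
    if match is None:
        return None
    source, target, username, day = match.groups()
    return source, target, username, date.fromisoformat(day)
```

For example, `conflicts_filename("ENG", "AFR", "woerie", date(2010, 7, 29))` produces the example name given above.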
In this document, all the conflicts (multiple occurrences of the same source sentence with slightly different translations) are listed one beneath the other. Each language pair consists of a sentence in the source language followed, on the next line, by its translation in the target language, and a blank line separates each pair from the next. The pairs that follow the first pair for a given source sentence are the conflicts found. Here is an example of such a document:
A single word that is in bold.
'n Enkel woord wat vetgedruk is.

A single word that is in bold.
A enkele word wat in vet is.

Only one italic word.
Slegs een kursiewe woord.

Only one italic word.
Only een italic word.
Here you can see that there are two conflicts, each with two language pairs. Future work includes adding a function to import a document with resolved conflicts and export it as a TM.
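Until such an import function exists, a small script can read a conflicts document back into a structure that is easier to review. The sketch below assumes the layout described above (a source line, its target line, and a blank line between pairs) and groups the competing translations by source sentence, so the example document yields two entries with two translations each. `parse_conflicts` is a hypothetical name, not part of the application.

```python
from collections import defaultdict


def parse_conflicts(text):
    """Group the pairs in a conflicts document by source sentence.

    Assumes each pair is a source line followed by its target line, with
    pairs separated by one or more blank lines.
    """
    pairs, block = [], []
    for line in [l.strip() for l in text.splitlines()] + [""]:  # "" flushes the last pair
        if line:
            block.append(line)
        elif block:
            if len(block) != 2:
                raise ValueError(f"expected a source and a target line, got {block!r}")
            pairs.append((block[0], block[1]))
            block = []
    conflicts = defaultdict(list)
    for source, target in pairs:
        conflicts[source].append(target)
    return dict(conflicts)
```

Grouping by source sentence does not depend on how many blank lines separate one conflict from the next, which makes the parser tolerant of small layout differences.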